LIVELINET: A Multimodal Deep Recurrent Neural Network to Predict Liveliness in Educational Videos

نویسندگان

Arjun Sharma

Arijit Biswas

Ankit Gandhi

Sonal Patil

Om Deshmukh

چکیده

Online educational videos have emerged as one of the most popular modes of learning in the recent years. Studies have shown that liveliness is highly correlated to engagement in educational videos. While previous work has focused on feature engineering to estimate liveliness and that too using only the acoustic information, in this paper we propose a technique called LIVELINET that combines audio and visual information to predict liveliness. First, a convolutional neural network is used to predict the visual setup, which in turn identifies the modalities (visual and/or audio) to be used for liveliness prediction. Second, we propose a novel method that uses multimodal deep recurrent neural networks to automatically estimate if an educational video is lively or not. On the StyleX dataset of 450 one-minute long educational video snippets, our approach shows an relative improvement of 7.6% and 1.9% compared to a multimodal baseline and a deep network baseline using only the audio information respectively.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

متن کامل

The Optimization of Forecasting ATMs Cash Demand of Iran Banking Network Using LSTM Deep Recursive Neural Network

One of the problems of the banking system is cash demand forecasting for ATMs (Automated Teller Machine). The correct prediction can lead to the profitability of the banking system for the following reasons and it will satisfy the customers of this banking system. Accuracy in this prediction are the main goal of this research. If an ATM faces a shortage of cash, it will face the decline of bank...

متن کامل

MEMN: Multimodal Emotional Memory Network for Emotion Recognition in Dyadic Conversational Videos

Multimodal emotion recognition is a developing field of research which aims at detecting emotions in videos. For conversational videos, current methods mostly ignore the role of inter-speaker dependency relations while classifying emotions. In this paper, we address recognizing utterance-level emotions in dyadic conversations. We propose a deep neural framework, termed Multimodal Emotional Memo...

متن کامل

Detecting Interrogative Utterances with Recurrent Neural Networks

In this paper, we explore different neural network architectures that can predict if a speaker of a given utterance is asking a question or making a statement. We compare the outcomes of regularization methods that are popularly used to train deep neural networks and study how different context functions can affect the classification performance. We also compare the efficacy of gated activation...

متن کامل

Action Recognition with Joint Attention on Multi-Level Deep Features

We propose a novel deep supervised neural network for the task of action recognition in videos, which implicitly takes advantage of visual tracking and shares the robustness of both deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). In our method, a multi-branch model is proposed to suppress noise from background jitters. Specifically, we firstly extract multi-level dee...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

LIVELINET: A Multimodal Deep Recurrent Neural Network to Predict Liveliness in Educational Videos

نویسندگان

چکیده

منابع مشابه

Speech Emotion Recognition Using Scalogram Based Deep Structure

The Optimization of Forecasting ATMs Cash Demand of Iran Banking Network Using LSTM Deep Recursive Neural Network

MEMN: Multimodal Emotional Memory Network for Emotion Recognition in Dyadic Conversational Videos

Detecting Interrogative Utterances with Recurrent Neural Networks

Action Recognition with Joint Attention on Multi-Level Deep Features

عنوان ژورنال:

اشتراک گذاری